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Abstract 

Traditional analyses of capture-recapture data are based on likelihood functions 
that explicitly integrate out all missing data. We use a complete data likelihood 
(CDL) to show how a wide range of capture-recapture models can be easily fitted 
using readily available software JAGS/BUGS even when there are individual-specific 
time- varying covariates. The models we describe extend those that condition on 
first capture to include abundance parameters, or parameters related to abundance, 
such as population size, birth rates or lifetime. The use of a CDL means that any 
missing data, including uncertain individual covariates, can be included in models 
without the need for customized likelihood functions. This approach also facilitates 
modeling processes of demographic interest rather than the complexities caused by 
non-ignorable missing data. We illustrate using two examples, (i) open population 
modeling in the presence of a censored time-varying individual covariate in a full 
robust-design, and (ii) full open population multi-state modeling in the presence of 
a partially observed categorical variable. 
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1 Introduction 



An important feature of capture 



eluding individua l -spec i fic ones (|Lebreton 



2006 



King et al 



2008 



recapture modeling is the ability to include covariates, in 



et a 



Catchpole et al 



2008 



1992 



Schwarz 



Bonner et al. 



et al 



1993 



Bonner and Schwarz 



2010). The development of 



methods for including individual covariates has focused on models that condition on the 
first capture of each individual. A consequence is that likelihood based inference is re- 
stricted to statements about survival or recapture probabilities and related quantities. 
Importantly, these models do not include abundance parameters (or parameters related 
to abundance, such as population growth rates or stopover time) in the likelihood. In- 



stead, in 



(IHuggins 



erence about abundance has re 



1989 



McDonald and Amstrup 



ied on ad-hoc Horvitz-Thompson-type approaches 



20011). 



Individual-specific time-varying covariates pose problems in models based on the full 
likelihood as the covariate values cannot be observed for individuals that were available 
for capture but were not caught. In order to fit these models by maximum likelihood, 
we would need to integrate out the missing covariate values for the unseen individuals 
for each possible value of abundance, as well as integrating out the missing values for the 
individuals we observed. Explicit integration of the likelihood leads to the observed data 
likelihood (ODL), but this integration is often difficult to do in practice. An alternative 
approach is to model in terms of the complete data likelihood (CDL), where specialized 



computational algorithms, such as the Gibbs sampl er or the EM alg o rithm perform the re- 



quired 



(12008 



integ ration during the model fitting process. 



Schofieldl (120071 ) 



Schofield and Barker 



20091 1 lay out a unified framework for capture- recapture modeling using the CDL, 
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which while flexible enough to include individual-specific time-varying covariates, still re- 
quires the user to construct these algorithms in order to fit the model. The same process 
is r equired when including in dividual cova riates in closed population studies, for example, 



sec 



King and Brooksl (120081) 



Royjd (120091 ). 

To date, no user-friendly software is available for full open population modeling in 
the presence of individual-specific time-varying continuous covariates. Algorithms do 
exist for other individual-specific open population models, including user- written algo- 



rithms for the multi-state model ( IDupuis and Schwarz 



20071 ) and models where surviva l 



and fecundity parameters depend on population abundance (jSchofield and Barker 



20081 ). 



Unfortunately, these algorithms require custom-written softw are and lack ge neralizabil 



i ty. Another ap proach is to include these models in JAGS (jPlummer 



f lLunn et al 



20031 ) or BUGS 



20001 ). soft ware that fits mode ls 



ingly used by eco l ogists 



Link and Barker (20 



in 



Rovle and Doraziol (120081 ) 



Royle et al. 



(120071 ) 



using the Gibb s samp l er and is being i ncreas - 



Gimenez et al 



(120091 ) 



Schofield et al. 



(120091 ). 



0). T 



li s has been done f or ind ividual-specific mixed effects models 



Link and Barkerl (120101 ). 



Here we consider models for two datasets, bot h of which hay e indiv idual-specific time- 



Nichols et al. 



( 119921 ). where the robust 



varying covariates. The first is an example from 
design is used to sample meadow voles, Microtus pennsylvanicus, with the body mass 
measured every time there was a capture. The body mass measurements were discretized 
in order to fit a multi-state model, conditional on first capture, to understand how the 



covariate body mass changed through time. This was later refined by 



Bonner and Schwarz 



( 120061 ) who model body mass as an individual-specific time-varying covariate, although, 
their analysis ignored the robust design and conditioned on first capture. Other analyses 
have used th e robust design to esti mate abundance, but did this without taking into account 



body mass (I Williams et al. 



2002 



Pg. 525). Here we use information on the body mass 



within the robust design to estimate abundance. 



The second dataset is a study of conjunctivitis in house finch, Carpodacus mexicanus, 
where the covariate of interest is disease status, a covariate that is not always observed. 



Conn and Coochl (120091 ) model the data using an extension to the multi-state model to 
account for the partially observed data, however, they only model conditional on first 
capture. Even though they are able to obtain estimates of survival for each disease class 
and movement between the two disease classes, they were unable to get an predictions of 
the population in each disease class th rough tim e. 



Here we show how the framework o 
can be used to extend the models of 



Schofield (2007) 



Rovle et al 



Schofield and Barker 



(I2007h 



Rovle and Dorazio 



(2008 



20091) 



(120081 ) and 



Link and Barker! (120101 ) to fit full open population models with individual-specific time- 
varying covariates in BUGS. There are only minor differences in the model and BUGS code 
for the two examples, despite the differences between the datasets: (i) continuous vs. cat- 
egorical covariate, (ii) robust design vs. standard capture-recapture design, (iii) covariate 
measurement every capture occasion vs. uncertainty in covariate measurements on some 
capture occasions. This shows the power and flexibility of the modeling approach, since 
the two examples include as special cases: the multi-state model, multi-event type models, 
individual-specific time-varying continuous covariates, and lifetime duration models such 
as the stop-over model. 



2 Model Framework 



Capture-recapture models involve complex missing data mechanisms. Traditional ap- 



proaches to inference focus on deriving the likelihood for the observed data (t 
by int e grating over all missing data. Inst ead, we use the modeling fra mework of 



he ODD 



(120071) 



Schofield and Barker! ( 2008 



20091 ) that uses data augmentation ( jTanner and Wong 



Schofield 



19871 ) to allow us to model in terms of the complete data likelihood (CDL). Similar ideas 
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have also been proposed bv lRovle and Doraziol ((2008). 



The likelihood we use for inference is in terms of the complete data, which for a capture- 
recapture study with individual-specific time-varying covariate data are the (i) times of 
birth, (ii) times of death, and (iii) complete covariate values for each individual ever avail- 
able for capture. The main advantage of using this likelihood over the ODL is that we 
are able to focus on modeling the processes of interest rather than having to account for 
the complexities caused by missing data that result from sampling methods. Importantly, 
in adopting the CDL approach to inference, we do not need to make any additional as- 
sumptions to those made when using the ODL. It is simply a reformulation of the model 
in terms of the easier-to-understand CDL where we use computational algorithms, such as 
Markov chain Monte Carlo (MCMC) or the expectation-maximization (EM) algorithm, to 
integrate over all missing data. 

We write the CDL for the capture-recapture model including birth and covariates as 

,b\nb Arl \„d\„b nd \;] W\ h „d nX 



[a b \9 b , N] [a d \a b ,9 d , N}[X\a b , a d , 9 X , N^ [z\9 z , N], 

Birth Mortality Capture Covariate 



where is the probability (density) of the random variable Y given X; 

a b j = means that individual i has yet to be born at, or before, sampling period j, with 
a b , = 1 otherwise: 

afj = 1 means that individual i has yet to die at sampling period j, with afj = otherwise; 
Xij = 1 means that individual % was caught in sampling period j, with X^ = otherwise; 
Zij is the covariate value for individual % in sampling period j; 

6 b are parameters describing the birth process, 9 d are parameters describing the mortality 
process, 9 X are parameters describing the capture process, 9 Z are parameters of the covari- 
ate distribution and iV is the total number of individuals available for capture during the 



study period (which is distinct from Nj, the number of individuals alive in the jth sam- 
pling occasion). The covariate z can be used to help model the birth, death and covariate 
processes, although we assume that this is specified through the models for 9 b , 9 d and 9 X . 
In order for inference to be valid, we must ensure that these models do not violate the laws 
of conditional probability, for example, by assuming that survival probability depends on 
the covariate value at the end of the period. 

Most demographic summaries of interest are obtained as derived quantities of a d and 
a b . For example, the population size in each sampling period, Nj, and the "lifetime" for 
each individual Aj are 

N 
k 

Other potential quantities of interest we could specify include the number of births and 
deaths between each sampling period. 

The choice of model for each of the components in (JTJ) will depend on the data, and the 
assumptions we are willing to make. For a standard capture-recapture study design, we 
present some common models for each component. We leave the parameters for mortality 
and capture to be individual specific as these are being modeled in terms of individual- 
specific covariates. 

2.1 The Birth Component 

One possible model for the birth components is 

N k 

[a b \e\ n] = nki|0 6 ] IK-k 6 !' • • • > <•-!> ^ 

i=l j=2 
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N 



0-1 



3-1 



n^ern(Cx) J[Bem - a\ h )C 3 + 1 - JJC 1 



(2) 



i=i 



3=2 



\h=l 



h=l 



where Bern(p) denotes a Bernoulli distribution with parameter p and we set (k — 1- We 
include the term nQ(l - 4J0 + 1 " IK=i(1 — a ih) to ensure that an individual can 
only be born once. The value £i is the probability of being born before the study began, 
with the values Cj+i defined as the probability of being born between sample j and j + 1 
conditional on (i) not being born before j, and (ii) being at risk of capture at some point 
during the study. A possible reparameterization is 



I 

This gives the birth formulation of ISchwarz and Arnasonl (119961 ) where /3,- is the probability 



of being born between sample j and j + 1 conditional on being at risk of capture at some 
point during the study. Neither Q nor j3j are meaningful birth parameters, since they are 
defined in terms of the study /sampling process, as for example, a change in k changes the 
parameter values. A more natural parameterization is to use per-capita birth rates, 



Vj 



, j = 1, ...,k-l. 



This gives the birth formulation used in 



Schofield and Barker! ( 120081 ). where r]j is the ex- 



pected number of births between sampling period j and j + 1 for each individual in the 



population at sampling peri od 7 condition s 



birth parameters, fj, used bv IPradell ( 119961 ) 



on N. These parame ters are similar to the 



Link and Barker! ( 120051 ) ; the difference is that 



the denominator they use is E[iVj|iV] instead of Nj. 
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2.2 The Mortality Component 

We factor the component for mortality as, 



N k 



[a d \a b ,6 d ,N} = UUK\4-v<^ dd \ 

i=l j=2 
N k 

II II Berniaf^M^Ai-i + (1 - <-i)))> ( 3 ) 



t=l J=2 



where the parameter SV,- is the probability of individual i surviving between sampling period 
j and j + 1. The term af J _ 1 (a^_ 1 5'ij + (1 — is required to ensure that an individual 

can only (i) die after being born, and (ii) live once. 



2.3 The Capture Component 

We factor the component for capture as, 



N k 



[X\a b , a d , 9 X , N] = I][*<K> 4>° 

i=i j=i 

N k 

= Hl[Bern(a^p tJ ), (4) 

i=\ j=i 

where Pij is the probability of capture for individual i in sampling period j. The term 
afjdij is required to ensure that an individual is only available for capture while it is alive. 



2.4 Additions Required For BUGS/ JAGS 

In order to make inference about the parameters in t he model, we a dopt a Bayesian ap- 



proach and fit all examples using the software JAGS 



Plummer 



20031 ) . with model, data, 



initial values and script files available at www. maths . otago . ac .nz/~rbarker/BUGS. We 



use JAGS for the following examples due to superior convergence in trial runs of the al- 
gorithms as compared to OpenBUGS. The modeling language of JAGS is nearly identical 



to BUGS flLunn et all 1200(1 ) and is able to be called from R f lSu and Yaiimal 120091 ). We 
describe the main differences between BUGS and JAGS and how to specify the data and 
the initial values in the supplementary materials. 



Neither JAGS nor BUGS allow stochastic indices, such as N, as was used in lSchofield and Barker 



(2008 1 ). Instead, we must use a computational trick to include N. The trick is given in 
Durban and Elstonl ( 120051 ) and involves specifying M, an upper boun d for N . An alter- 



Rovle et al. 



( I2007h . They also 



nate, yet mathematically equivalent approach is given in 
specify M but then reparameterize the model in terms of an incomplete indicator variable, 
w, instead of N. The value Wi = 1 means that individual % was at risk of capture during 
the study and Wi = otherwise, with ^2 i=1 w,i = N. Having used both approaches, we find 



them practically equivalent with the exception that the specification of 



Royle et al. 



(120071 ) 



no longer has N avai 
than the approach of 



able for hierarchica l mode ling, but does generally run slightly faster 



Durban and Elstonl (120051 ) . Since we have no desi re to include a hier- 



Royle et al. 



( 120071 ) 



archical model for N in the examples we explore, we use the approach of 
to make use of the faster algorithm. For readability, we do not change the CDL in (pQ) to 
reflect this additional likelihood component and continue to write the CDL conditioning 
on N. 

Another difficulty is that attempting to use the per-capita birth rate (rjj) parameteri- 
zation in JAGS or BUGS results in code that is currently impractically slow to run. Here 
we use the less natural Q parame terization described above. This is not a limitation of 



MCMC or the Gibbs sampler, as 



Schofield and Barkerl (120081 ) used the per-capita birth 



rates. Instead it is a current limitation of JAGS and BUGS. 



3 Example: Meadow Voles 



Nichols et al. 



(Il992l ) report a capture-recapture study of meadow voles, Microtus pennsyl- 
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vanicus, using a robust design f lPollocki Il982h with 173 individuals caught over 6 primary 
periods each with 5 secondary samples. Every time an individual was caught the mass of 
the animal was recorded to the nearest integer. The scales used to measure mass had a 



ma ximum at 60 grams, 



with m any individuals censored. 



Bonner and Schwarz 



(120061 ) collapsed the data across the secondary periods, and allo- 



cated a single measurement to the observed body mass. Using these data they extended 
the Cormack-Joll y-Seber model (CJS) t o include an individual-specific time-varying con- 
tinuous covariate. ISchofield et al.l (120091) u s ed the sa me data to show how to fit this model 



using BUGS (neither 



Bonner and Schwarzl (120061 ) nor 



Schofield et al. 



( 120091 ) accounted for 



the censored data). Here we extend this model to include the full robust design and make 
use of all information on body mass. Including the birth process in the model allows us to 
estimate the population size of the meadow vole in the presence of the individual-specific 
time-varying continuous covariate. 

The CDL is given in (jTJ, with the birth component given in ([2]) and the mortality 
component given in ([3]). To include the robust design, we redefine X to be an array with 
Xiji = 1 if individual i is caught in primary period j and secondary period I. We extend 
the capture component in (j4j) to include both the k\ primary periods and the &2j secondary 
periods, 



N ki k 2j 

\x\a\ a\ e x , n] = nn n x > <■> < > 9 

i=i j=i 1=1 

N ki k 2j 

=nnri 5era K<^)' 

i=i j=i 1=1 

We also redefine z to be an array with zyj being the body mass for individual % in 
primary period j and secondary period I. Since every observed body mass value was either 
censored or rounded we treat z^i as missing for every individual in every sampling period 
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and denote the observed masses by We specify the model for z as 

N ki k 2j 

i=l j=bi 1=1 
N fei k 2 j 

=nnri Ar ( A ^^) / fe'^7), 

i=l 3=fi «=1 

where N(fi,a 2 ) denotes a Normal distribution with mean /i and variance a 2 , bi is the 
first sample individual i was alive and J() is used to include the rounding, censoring and 
truncation, with 



zfif + 0.5 if X ifl = 1 and z°$ ± 60 

i 02ijl 

otherwise I oo otherwise. 



In other words, we assume that during the secondary periods when the population is 
assumed to be closed, mass rema ins constant and any differe nces are attributable to the 




measurement error cr?. We follow 



Bonner and Schwara (120061 ) and model as 



A i6i ~ iV^c^), Xij ~ iV(A ii _ 1 + A^cr^), j = k + 1, . . . ,ki. 

This is a random walk with drift, where the mass of each individual increases, on average, 
Aj grams between primary period j and j ' + 1. 
We model parameters S and p as 

logit(^) = a + aiX'ij + tf, ~ N{0, 
logit(p^)=7o + 7iA; + ^ + eP, rf~N(0 l( r 2 pl ), ej ~ iV(0, a p 2 2 ), 

where A^ is an approximately standardized value of A^. We allow individuals to have sur- 
vival probabilities that depend on their mass, with additional temporal variability, modeled 
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as a random effect. We allow probability of capture to depend on body mass due to allow 
for a sampling strategy that discriminated due to body mass, with additional variability 
within the secondary period and between primary periods, both modeled as a random ef- 
fect. This is equivalent to specifying the closed population model M t as a random effect, 
with the mean varying between the primary periods. The priors for all parameters are 
given in the supplementary materials. 

In order to determine the effect that rounding and censoring have on the results we 
also fit the model with z^i = z°jf when X^i = 1. 

Each of the models was run in JAGS with an adaptive phase of 5000 iterations followed 
by a posterior sample of 20000 iterations. To ensure convergence we run 3 parallel chains 
with differ ent starting values and ch ecked convergence with the Brooks-Gelman-Rubin 



diagnostic (IBrooks and Gelman 



19981 ). We combined the posterior samples from the three 



chains to give a total posterior sample of 60000. 

The results suggest that there is little practical difference between accounting for the 
censoring of the covariate values and simply modeling using the raw observations (figured]). 
While there are differences in the model for mass, particularly in the variances, this does 
not translate to substantial differences in the model for survival or probability of capture. 
While mass appeared to be associated with the probability of capture, with larger animals 
having a higher chance of capture, there is no evidence of body mas s being associated with 



survival. This result differs from 



Bonner and Schwarzl (120061 ) and 



Schofield et al. 



(120091 ) 



who found that body mass was not a ssociated with either c aptur e probability or survival. 



This difference appears to be due to 



Bonner and Schwarz 



(120061 ) compressing the robust 



design into a more standard CJS design. To ensure that this is consistent with the data 
we compared the observed data for individuals caught at least once in any given secondary 
period, with a significant increase in the average body mass as the number of captures 
increases. After adjusting for the effect of body mass on capture probability, the abundance 
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of the meadow vole appears stable, fluctuating between ~ 55 to ~ 85 individuals during 
the study (figure [T]) . 



4 Example: Conjunctivitis in House Finch 



Conn and Coochl (120091 ) used a multi-state model to study conjunctivitis in house finch, 
Carpodacus mexicanus, with 813 individuals caught in 16 samples. A two-state model 
(whether or not an individual had conjunctivitis) was used, with some individuals having 
unknown status. Our approach is to include all missing disease information using data 
augmenta tion and trea ting the disease as a individual-specific time-varying categorical 



covariate (IDupuis 



19951 ). Thus we can use the CDL in ([I]) with disease being the covariate z. 



Since the covariate takes two values, we can examine abundance, or any other demographic 
su mmary, separately for each g roup. 



Dupuis and Schwarzl (120071 ) used the CDL to fit a multi-state model and estimate abun- 
dance. Their approach differs to ours due to different computational algorithms. To im- 
prove the efficiency of their MCMC sampling, they summed over the latent state variables 
Zij when updating N. While this will yield a quicker algorithm, both in terms of mixing 
and time, it lacks generalizability beyond the multi-state model to, say, individual-specific 
time- vary continuous covariates. In contrast, our approach allows us to apply the CDL 
across a range of different models and different covariate distributions, including continuous 
covariates, without major modifications in the JAGS/BUGS code. 

Since many individuals have uncertain disease status, even whe n encountered , we must 



19761 ). Here we 



consider assumptions about the missingness of these observations (jRubinl 
assume that they are either missing completely at random or missing at random. In either 
case, the process that describes how the data go missing does not need to be included in 



the model. In the supplementary materials we describe and fit the model where we assume 
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that the additional missing data is missing not at random. The results are very similar to 
those from the model assuming the additional missing data is missing at random. 

The CDL is given in ([T]) and we specify the birth component as in (j2j), the death 
component as in ([3]) and the capture component as in (J4]). The only component left to 
specify is the covariate, disease. 

We specify the model for disease as 

N k 

[z\9 z ,N]=l[[z ibi \9 z } J] to|zfr_i,0*] 

i=l j=bi+l 
N k 

= Y[Cat((v 1 ,v 2 )) Yl Catdu^uu^tf)) 

i=l j=bi + l 

where Cat(ir) is a categorical distribution with probability vector 7r. The covariate value 
Zij = 2 indicates that individual i does has the disease in sample j and z i3 - = 1 otherwise, v\ 
is the probability of being in state / in the first sample after birth and Uhi is the probability 
of moving from state h to state /. We use a generic categorical distribution (instead of the 
binomial) to show how this model generalizes to more than two states. 
To complete the model specification we model S and p as 

logit(^) = a + aJizij = 2) + 77/, rjf ~ N(0, a 2 s ), 
logit(p fj ) = 70 + 7iA% = 2) + rf 3 , rf 3 ~ ^(0, oj). 

We allow individuals to have survival and capture probabilities that depend on their disease 
status, with temporal variability modeled as a random effect. The priors for all parameters 
are given in the supplementary materials. 

Each of the models was run in JAGS with an adaptive phase of 25000 iterations followed 
by a posterior sample of 25000 iterations. To ensure convergence we run 3 parallel chains 
with different starting values and checked convergence with the Brooks-Gelman-Rubin 
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diagnostic (IBrooks and Gelman 



19981 ). We combined the posterior samples from the three 
chains to give a total posterior sample of 75000. 

The results suggest that disease status is associated with both survival and capture 
probability (figure |2]). Having conjunctivitis lowered the log-odds of survi val, while increas- 



i ng th e log-odds of capture. The results for survival appear to agree with 



Conn and Cooch 



( 120091 ). with no results for capture probability available to compare. The transition proba- 
bilities suggest that it is rare for an individual to develop conjunctivitis, with less than 5% 
of disease free animals contracting the disease (figure [2]). However, once an individual has 
conjunctivitis it is somewhat difficult to become disease free, with around three quarters 
of individuals remaining in the disease state from one sample to the next. The population 
size for diseased animals, while low, appears to be relatively stable with no fewer than 5 
diseased individuals in the population (and as many as 40) during the study (figure [2]). It 
is interesting to note that the population sizes for the diseased and non-diseased states, 
while similar, do not necessarily exhibit the same dynamics through time. In particular, 
one could claim that there are periods where one class increases or decreases while the 
other remains relatively stable. 



5 Discussion 



We have shown how to use the framework of 



Schofieldl (120071 ) 



Schofield and Barkerl ( 2008 



20091 ) to fit using JAGS or BUGS complex models where we estimate demographic sum- 
maries of interest in the presence of individual-specific time-varying covariates. The two 
examples we show include modeling in the presence of covariate uncertainty and including 
different study designs. The CDL conveniently factorizes so that we are able to specify 
separate models for the birth, mortality, capture and covariate processes while fitting the 



joint model in JAGS or BUGS using MCMC methods. This means our focus can move from 
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accounting for the complex sampling process to focusing on specifying biologically mean- 
ingful models for the processes of interest, including hierarchical models for the parameters 
that describe demographic changes in the population. 

The house finch example shows how using the CDL facilitates modeling of missing 
data, including covariate uncertainty. The model is identical to the one where all data 
were actually observed, except that we must now consider, and potentially include, the 
process by which the data go missing. This is in contrast to the ODL where any missing 
data, including covariate uncert ainty, usually requi re sp ecification of a new like lihood. 



Examples of this can be seen in 



Nichols et al. 



( 120041 ) and 



Conn and Coochl ( 120091 ) where 



uncertainty in the covariate requires a new likelihood to be specified in order to include 
the missing data. 

The advantages in using the CDL for modeling comes at a computational cost. In a 
Bayesian setting, all missing values are treated as 'unknowns', which means each missing 
value needs to be updated in every MCMC iteration. Thus the computational burden 
increases with the amount of missing data. With current processor power, we are limited 
in the size of the data sets that we are able to fit. Using JAGS/BUGS, it can soon become 
difficult to fit datasets with either a large number of individuals or many sampling periods. 
To a large extent, these deficiencies can be overcome with user- written, application specific 
code, making use of specia lized algorithms and other computational advantages (such as 



Dupuis and Schwarzl ( 120071 ) for the multi-state model). However, this limits generalizabil- 
ity, with new algorithms needed for each application. These computational limitations, 
while serious, should not be an impediment to use. Advances in algorithms for MCMC 
and continuing increases in computational power mean that we will continue to be able to 
fit bigger and more complex models into the future. 

An issue we have not mentioned is mod el checking. We sug gest that model checking 



be done using posterior predictive checking (I German et al. 



2004 Pg. 159). One approach 
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is to focus on features of the data that are biologically important. For example, we may 
want to ensure that our model explains the difference between the number of individuals 
caught in successive sample occasions. We then generate replicate datasets from the fitted 
model and either (i) visually check, or (ii) specify an appropriate test-statistic, to see if the 
replicates are consist ent with the observed data. An example of a visual check is shown in 

Pg. 164-165) with the speed of light data. 
An alternative approach is to make use of an omnibus test statistic to ensure that our 
fitted model does not generate data that are inconsistent with the data we have observed. 



Approaches that have b een adopted in the Bayes ian mark-recapture literature include use of 



f lGelman et al. 



2004 



the li kelihood function ( iKing and Brooksl |2002| ) or Freeman- Tukey statistic (IBrooks et al. 



2000). 



Goodness-of- fit for mark-recapture models has received considerab l e attention in a fre - 



quentist setting (jPollock et al. 



1985 



Burnham et al. 



1987 



Barker 



1999 



Pradel et al. 



2003|). 



A constructive approach to goodness-of-fit testing can be taken when a multinomial model 
is used that is a member of the full exponential family based on the factorization 

[Data | MSS] x [MSS |0]. 



For a large class of mark-recapture models the term [Data | MSS] has a hypergeometric 
distribution and provides a natural partitioning of the data into test components. As far 
as we are aware an approach based on the predictive di stribution [Data | MSS] has been 



little used in a Bayesian setting (see 



Wright et al. 



20091 . for an exception) but we believe 



that this should be a productive approach to goodness-of-fit assessment. 

A related issue is model selection. We recommend different approaches to model selec- 
tion depending on the objective of the study. If the objective is to learn about the system 
and to generate hypotheses about relationships then we advocate exploring the data and 
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finding models that best fit the data. A num ber of models can be explored and com- 



pared using cross-validation (IHastie et al 



2009) as well as other criteria. If the objective 



is in validating previously generated hypotheses and making inference in the presence of 
model uncertainty, then we advocate exploring posterior model probabilities, or equiva - 



lently Bayes Factors, fo r a small set of scientifically driven models (ILink and Barker 



20061 ). 



Barker and Link! (120101 ) outline an approach to calculating posterior model probabilities 



that uses the output obtained from running the individual models using MCMC. Other 

I 1 

techniques for calculating posterior model probabilities are described in King et al.l (120101 ) . 
Another possible approach to model selectio n involves specifying a hierarchical model, that 



has as special cases, all models considered (j German et al. 



2004 



Pg. 405 - 406). Then, in- 



stead of the usual approach of either including the effect or not including the effect before 
potentially combining the results using model-averaging, we can specify an informative 
prior distribution centered on zero, that can be viewed as a compromise between inclusion 
of the effect (with an approximately flat prior) and exclusion of the effect (with a prior 
with all mass at zero). An example of this approach is when we have a potential time 
effect. Instead of choosing between a model with no time effect, and one with a separate 
and unrelated parameter for each time point, we can, as we do in the two examples here, 
include time as a random effect with the variance estimated from the data. 

The CDL that we used to fit all models here can be extended in a number of ways. We 
expect it to extend naturally to continuous data models. Depending on the observed data, 
we would expect the CDL to remain the same, or at least similar, with the conditional 
likelihood components relating to birth, death, capture and any covariates changing to 
account for continuous time processes. 

Another extension is when we have uncertainty in the tags themselves. An example 
of this is when our "tag" is a DNA profile of the individual. The problem here is that 
uncertainty in the DNA profile is due to various genotyping errors. The only change we 
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require in our CDL is to include the true tag as missing data and i nclude a com p onent 



that describes the corruption of the true tags to the observed tags. IWright et al 



f l2009h 



used this approach to estimate population size in a closed population study. 

Using the CDL, we are able to model complex depe ndencies between individuals in 



the population. For example, 



Schofield and Barkerl ( 120081 ) included density dependence on 



both birth rates and survival probabilities. Another potential example is one where DNA 
information for one individual is able to provide information about another individual, 
such as parents and offspring. Information about the death of parent gives information 
about the time of birth of the offspring and vice versa. The CDL approach is, at least in 
principle, able to include these relationships as well as including potential uncertainty in 
the offspring/parent relationship. 



References 

Barker, R. (1999), "Joint analysis of mark-recapture, resighting and ring-recovery data 
with age-dependence and marking-effect," Bird Study, 46(suppl), S82-91. 

Barker, R. J. and Link, W. A. (2010), "Posterior model probabilities computed from model- 
specific Gibbs output," In Preparation. 

Bonner, S. J., Morgan, B. J. T., and King, R. (2010), "Continuous Covariates in Mark- 
Recapture-Recovery Analysis: A Comparison of Methods," Biometrics, In Press. 

Bonner, S. J. and Schwarz, C. J. (2006), "An Extension of the Cormack-Jolly-Seber Model 
for Continuous Covariates with Application to Microtus pennsylv aniens, ," Biometrics, 
62, 142-149. 

Brooks, S. P., Catchpole, E. A., and Morgan, B. J. T. (2000), "Bayesian Animal Survival 
Estimation," Statistical Science, 15, 357-376. 

19 



Brooks, S. P. and Gelman, A. (1998), "General Methods for Monitoring Convergence of 
Iterative Simulations," Journal of Computational and Graphical Statistics, 7, 434-455. 

Burnham, K. P., Anderson, D. R., White, G. C, Brownie., and Pollock, . K. (1987), 
"Design and analysis methods for fish survival experiments based on release-recapture." 
American Fisheries Society Monograph, 5, 1-437. 

Catchpole, E., Morgan, B., and Tavecchia, G. (2008), "A new method for analysing discrete 
life history data with missing covariate values," Journal of the Royal Statistical Society: 
Series B (Statistical Methodology), 70, 445-460. 

Conn, P. and Cooch, E. (2009), "Multistate capture- recapture analysis under imperfect 
state observation: an application to disease models," Journal of Applied Ecology, 46, 
486-492. 

Dupuis, J. and Schwarz, C. (2007), "A Bayesian approach to the multistate Jolly-Seber 
capture-recapture model," Biometrics, 63, 1015-1022. 

Dupuis, J. A. (1995), "Bayesian estimation of movement and survival probabilities from 
capture-recapture data," Biometrika, 82, 761-772. 

Durban, J. W. and Elston, D. A. (2005), "Mark- Recapture With Occasion and Individual 
Effects: Abundance Estimation Through Bayesian Model Selection in a Fixed Dimen- 
sional Parameter Space," Journal of Agricultural, Biological, and Environmental Statis- 
tics, 10, 291-305. 

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004), Bayesian Data Analysis, 
Chapman & Hall/CRC, 2nd ed. 

Gimenez, O., Bonner, S. J., King, R., Parker, R. A., Brooks, S. P., Jamieson, L. E., 
Grosbois, V., Morgan, B. J. T., and Thomas, L. (2009), "WinBUGS for population 

20 



ecologists: Bayesian modeling using Markov Chain Monte Carlo methods," in Modeling 
Demographic Processes in Marked Populations, eds. Thomson, D. L., Cooch, E. C, and 
Conroy, M. J., Springer, vol. 3 of Environmental and Ecological Statistics, pp. 885-918. 

Hastie, T., Tibshirani, R., and Friedman, J. (2009), The Elements of Statistical Learning. 
Second Edition., Springer. 

Huggins, R. (1989), "On the Statistical Analysis of Capture Experiments," Biometrika, 76, 
133 - 140. 

King, R., Brooks, S., and Coulson, T. (2008), "Analyzing Complex Capture-Recapture 
Data in the Presence of Individual and Temporal Covariates and Model Uncertainty," 
Biometrics, 64, 1187-1195. 

King, R. and Brooks, S. P. (2002), "Bayesian model discrimination for multiple strata 
capture-recapture data," Biometrika, 89, 785-806. 

- (2008), "On the Bayesian Estimation of a Closed Population Size in the Presence of 
Heterogeneity and Model Uncertainty," Biometrics, 64, 816-824. 

King, R., Morgan, B. J. T., Gimenez, O., and Brooks, S. P. (2010), Bayesian Analysis for 
Population Ecology, Chapman & Hall/CRC. 

Lebreton, J. D., Burnham, K., Clobert, J., and Anderson, D. (1992), "Modelling survival 
and testing biological hypotheses using marked animals: A unified approach with case 
studies," Ecological Monographs, 62, 67-118. 

Link, W. A. and Barker, R. J. (2005), "Modeling Association among Demographic Param- 
eters in Analysis of Open Population Capture-Recapture Data," Biometrics, 61, 46 - 
54. 



21 



- (2006), "Model Weights and the Foundations of Multimodel Inference," Ecology, 87, 
2626-2635. 

- (2010), Bayesian Inference with ecological applications, Academic Press. 

Lunn, D., Thomas, A., Best, N., and Spiegelhalter, D. (2000), "WinBUGS - a Bayesian 
modelling framework: concepts, structure, and extensibility," Statistics and Computing, 
10, 325-337. 

McDonald, T. and Amstrup, S. (2001), "Estimation of population size using open capture- 
recapture models," Journal of Agricultural, Biological, and Environmental Statistics, 
206-220. 

Nichols, J., Sauer, J., Pollock, K., and Hestbeck, J. (1992), "Estimating transition prob- 
abilities for stage-based population projection matrices using capture-recapture data," 
Ecology, 73, 306-312. 

Nichols, J. D., Kendall, W. L., Hines, J. E., and Spendelow, J. A. (2004), "Estimation of 
Sex-Specific Survival from Capture-Recapture Data when Sex in not Always Known," 
Ecology, 85, 3192-3201. 

Plummer, M. (2003), "JAGS: A program for analysis of Bayesian graphical models us- 
ing Gibbs sampling," in Proceedings of the 3rd International Workshop on Distributed 
Statistical Computing, March, pp. 20-22. 

Pollock, K. H. (1982), "A Capture- Recapture Design Robust to Unequal Probability of 
Capture," Journal of Wildlife Management, 46, 752 - 757. 

Pollock, K. H., Hines, J. E., and Nichols, J. D. (1985), "Goodness-of-Fit Tests for Open 
Capture-Recapture Models," Biometrics, 41, 399-410. 



22 



Pradel, R. (1996), "Utilization of Capture-Mark- Recapture for the Study of Recruitment 
and Population Growth Rate," Biometrics, 52, 703-709. 

Pradel, R., Wintrebert, C, and Gimenez, O. (2003), "A proposal for a goodness-of-flt test 
to the Arnason-Schwarz Multisite capture- recapture model." Biometrics, 59, 43-53. 

Royle, J. (2009), "Analysis of capture- recapture models with individual covariates using 
data augmentation," Biometrics, 65, 267-274. 

Royle, J. and Dorazio, R. (2008), Hierarchical modeling and inference in ecology: the 
analysis of data from populations, metapopulations and communities, Academic Press. 

Royle, J. A., Dorazio, R. M., and Link, W. A. (2007), "Analysis of multinomial models 
with unknown index using data augmentation," Journal of Computational and Graphical 
Statistics, 16, 67-85. 

Rubin, D. B. (1976), "Inference and missing data," Biometrika, 63, 581-592. 

Schofield, M. R. (2007), "Hierarchical Capture- Recapture Models," Ph.D. thesis, University 
of Otago. 

Schofield, M. R. and Barker, R. J. (2008), "A unified capture-recapture framework," Jour- 
nal of Agricultural, Biological and Environmental Statistics, 13, 458-477. 

- (2009), "A Further Step Toward the Mother-of-all-Models: Flexibility and Functionality 
in the Modeling of Capture-Recapture Data," in Modeling Demographic Processes in 
Marked Populations, eds. Thomson, D. L., Cooch, E. G., and Conroy, M. J., Springer, 
vol. 3 of Environmental and Ecological Statistics, pp. 677-689. 

Schofield, M. R., Barker, R. J., and MacKenzie, D. I. (2009), "Flexible hierarchical mark- 
recapture modeling for open populations using WinBUGS," Environmental and Ecolog- 
ical Statistics, 16, 369-387. 

23 



Schwarz, C. J. and Arnason, A. N. (1996), "A General Methodology for the Analysis of 
Capture-Recapture Experiments in Open Populations," Biometrics, 52, 860-873. 

Schwarz, C. J., Schweigert, J. F., and Arnason, A. N. (1993), "Estimating migration rates 
using tag-recovery data," Biometrics, 49, 177-193. 

Su, Y.-S. and Yajima, M. (2009), R2jags: A package for running jags from R, R package 
version 0.01-26. 

Tanner, M. A. and Wong, W. H. (1987), "The Calculation of Posterior Distributions by 
Data Augmentation (with discussion)," Journal of the American Statistical Association, 
82, 529-550. 

Williams, B., Nichols, J., and Conroy, M. (2002), Analysis and management of animal 
populations., Academic Press. 

Wright, J., Barker, R., Schofield, M., Frantz, A., Byrom, A., and Gleeson, D. (2009), 
"Incorporating genotype uncertainty into mark-recapture-type models for estimating 
abundance using DNA samples," Biometrics, 65, 833-840. 



24 



-1 1 2 3 o 1 2 3 4 



S: Intercept 




I 1 1 1 T 

1 2 3 4 5 

Sampling Period 



Figure 1: Parameter estimates for the meadow vole example. The point gives the median 
of the marginal posterior distribution and the lines represent the central 50% and 95% 
credible intervals. In all plots, black is the model with censoring and blue is the model 
without censoring. 
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Figure 2: Parameter estimates for the house finch example. The point gives the median 
of the marginal posterior distribution and the lines represent the central 50% and 95% 
credible intervals. 
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